Add internal repartition metrics #21152
Conversation
I have some concerns about these low-level (kernel-profiling) metrics, so I'm sharing a few suggestions. (Not trying to block this, since it is useful for solving real problems; just offering additional perspectives and possible improvements.) Metrics are typically used for query tuning at the application level, while these low-level ones are mainly for internal debugging and are used less frequently. They may also introduce execution overhead that is hard to observe, as well as maintenance overhead. In general, it might be better to keep metrics that directly help application tuning, are frequently used, or are difficult to capture with external profilers. I suspect some of them can be observed directly with profilers/flamegraphs, so maybe they can be simplified? Additionally, we could consider introducing a new analyze level.
run benchmarks

🤖 Benchmark completed (GKE): tpch, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Yeah, I agree and am trying to be wary of this as well. I don't see any changes on the benches, but it may be hard to catch micro-regressions. I am using these and they are quite useful (and very convenient compared to a profiling tool like xctrace). Maybe these could be kept at the 'dev' level of analyze output?
I was thinking of opening an issue for this idea.
Hey, following up on this: I wasn't able to find an issue, but I would be willing to open one for these metrics. I think they are worth the additional level 👍
@gene-bordegaray Thanks, really appreciate it! Please go ahead. |
Force-pushed cd5514f to 769ad4f, then 769ad4f to fe6c854.
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported that the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
Force-pushed fe6c854 to 1b1ee57, then 1b1ee57 to 5924eaf.
@2010YOUY01 I have added this behind an 'internal' metrics level. I decided to keep it in this PR, since introducing the "internal" metrics level with repartition as its first use case seemed fitting and not too large a scope. These metrics have led me to find an improvement in the operator on fanned-out small batches here: #22159. Let me know what you think of the way it is introduced; I'm also not opposed to splitting it into two PRs if preferred.
Which issue does this PR close?

Rationale for this change

RepartitionExec spends time in different phases, but the existing metrics only expose broad timings. More granular timings are useful when investigating repartition bottlenecks without adding default EXPLAIN ANALYZE noise or runtime overhead.

What changes are included in this PR?

- Adds a new internal datafusion.explain.analyze_level for low-level development metrics.
- Adds MetricBuilder::if_enabled for conditional metric registration.
- Adds RepartitionExec timing metrics as the first consumer of the internal metrics level.
- Threads AnalyzeExec metric types through physical plan protobuf serialization and uses them during execution so metrics are registered consistently.

Are these changes tested?
Yes. Added coverage for conditional metric registration, RepartitionExec internal metric registration, EXPLAIN ANALYZE output, and physical plan protobuf roundtrip behavior.

Are there any user-facing changes?
Yes. Users can set datafusion.explain.analyze_level = 'internal' to include low-level repartition timing metrics in EXPLAIN ANALYZE. The default dev output is unchanged.
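As a usage sketch, the setting named above could be enabled per session before running EXPLAIN ANALYZE. The SET statement mirrors the option key stated in this PR description; the table name is a hypothetical placeholder.

```sql
-- Opt in to low-level repartition timing metrics for this session.
SET datafusion.explain.analyze_level = 'internal';

-- EXPLAIN ANALYZE output now includes the internal-level metrics.
EXPLAIN ANALYZE SELECT * FROM example_table;
```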